Coronaviruses, such as severe acute respiratory syndrome coronavirus and Middle East respiratory syndrome coronavirus, pose significant public health threats. Bats have been suggested to act as natural reservoirs for both these viruses, and periodic monitoring of coronaviruses in bats may thus provide important clues about emergent infectious viruses. The Eastern bent-wing bat Miniopterus fuliginosus is distributed extensively throughout China. We therefore analyzed the genetic diversity of coronaviruses in samples of M. fuliginosus collected from nine Chinese provinces during 2011-2013. The only coronavirus genus found was Alphacoronavirus. We established six complete and five partial genomic sequences of alphacoronaviruses, which revealed that they could be divided into two distinct lineages, with close relationships to coronaviruses in Miniopterus magnater and Miniopterus pusillus. Recombination was confirmed by detecting putative breakpoints of Lineage 1 coronaviruses in M. fuliginosus and M. pusillus (Wu et al., 2015), which supported the results of topological and phylogenetic analyses. The established alphacoronavirus genome sequences showed high similarity to other alphacoronaviruses found in other Miniopterus species, suggesting that their transmission in different Miniopterus species may provide opportunities for recombination with different alphacoronaviruses. The genetic information for these novel alphacoronaviruses will improve our understanding of the evolution and genetic diversity of coronaviruses, with potentially important implications for the transmission of human diseases.


INTRODUCTION


Coronaviruses (CoVs; order Nidovirales, family Coronaviridae, subfamily Coronavirinae) are enveloped RNA viruses with unusually large, positive-stranded RNA genomes of 26-32 kb (Lai, 2001). The viral genome contains five major open reading frames (ORFs) that encode the replicase polyproteins (ORF1a and ORF1b), spike (S), envelope (E), and membrane (M) glycoproteins, and the nucleocapsid protein (N) (Gonzalez et al., 2003; Holmes and Enjuanes, 2003). According to a proposal submitted to the International Committee on the Taxonomy of Viruses, CoVs can be classified into four genera, Alphacoronavirus, Betacoronavirus, Gammacoronavirus, and Deltacoronavirus, which replace the traditional CoV groups 1, 2, and 3 (King et al., 2011; Woo et al., 2009, 2012). CoVs are known to cause upper and lower respiratory diseases, gastroenteritis, and central nervous system infections in a number of avian and mammalian hosts, including humans (Weiss and Navas-Martin, 2005). Bats have been increasingly recognized as important natural reservoirs for CoVs. In particular, previously unknown CoVs related to severe human pathogens, such as severe acute respiratory syndrome (SARS) CoV (Li et al., 2005) and Middle East respiratory syndrome CoV (van Boheemen et al., 2012), were discovered in bats from China and other countries, with consequent recent increases in research into the biodiversity and genomics of CoVs in different bat species.

The diversity of CoVs arises from the infidelity of 
RNA-dependent RNA polymerase (RdRp), the high fre- 
quency of recombination, and the large genomes of CoVs 
(Woo, 2009). These factors have generated diverse strains 
and genotypes of the CoV lineage, and have given rise to 
new lineages able to adapt to new hosts. These new lineages 
have occasionally caused major zoonotic outbreaks with 
disastrous consequences (Woo, 2006). 

Previous studies reported the detection of several novel bat CoVs (BtCoVs) in Miniopterus magnater and Miniopterus pusillus from Hong Kong (Chu et al., 2008), and in Miniopterus fuliginosus from Japan (Shirato et al., 2012). However, although M. fuliginosus (the Eastern bent-wing bat) is the most extensively distributed Miniopterus species in China, the CoVs it harbors have not been systematically studied. M. fuliginosus bats are known to migrate long distances and typically roost with large numbers of bats from different genera, including Rhinolophus, Hipposideros, and Myotis (Cui et al., 2007; Miller-Butterworth et al., 2003); these habits may facilitate viral exchange between different bat species. Furthermore, our understanding of the diversity of CoVs in the genus Miniopterus remains limited. We therefore launched a survey to determine the dynamics and prevalence of CoVs in M. fuliginosus living in different geographical regions. In the current study, we explored the genetic diversity of CoVs in M. fuliginosus in China by analyzing 194 bat samples collected from nine Chinese provinces during 2011-2013.


RESULTS 


Bat surveillance and identification of CoVs 


A total of 194 M. fuliginosus bats were captured in nine 
provinces of China from October 2010 to October 2013, and 
pharyngeal and anal swabs were collected (Figure 1). All 
sampling sites were in or close to human gathering places. 
Only the anal swab samples harbored CoVs according to 
single-strain screening with conserved primers, and the pos- 
itivity rates for each province are shown in Figure 1. Se- 
quence analysis of the PCR amplicons identified al- 
pha-CoV-positive bats in six provinces (Guangdong, Hubei, 
Fujian, Henan, Anhui, and Jiangxi), but no other CoV gen- 
era were found. Interestingly, co-infections with different 
CoVs were detected in two M. fuliginosus anal specimens; 
one from Guangdong and one from Henan. 

We selected samples positive for CoVs that were representative of each province for genomic sequencing and established the complete genomic sequences of six alpha-CoVs: BtMf-AlphaCoV/Guangdong2012 (GD), BtMf-AlphaCoV/Hubei2013 (HB), BtMf-AlphaCoV/Fujian2012 (FJ), BtMf-AlphaCoV/Henan2013 (HN), BtMf-AlphaCoV/Anhui2011 (AH), and BtMf-AlphaCoV/Jiangxi2012 (JX). We also established partial genomic sequences of five other alpha-CoVs: BtMf-AlphaCoV/Guangdong2012-a (GD-a), BtMf-AlphaCoV/Guangdong2012-b (GD-b), BtMf-AlphaCoV/Hubei2013-a (HB-a), BtMf-AlphaCoV/Henan2013-a (HN-a), and BtMf-AlphaCoV/Henan2013-b (HN-b). The GD and GD-b sequences were identified in the same sample from Guangdong, and the HN and HN-b sequences were identified in the same sample from Henan.


Figure 1 The nine provinces (indicated in blue) in China where bats were captured and samples were collected. The numbers listed below indicate the numbers of samples positive for Lineage 1 (L1) and Lineage 2 (L2) and the total number of samples collected in each province. The red shading on Guangdong and Henan indicates the regions where co-infections of two lineages were detected.

Province     L1/L2/total samples
Shaanxi      0/0/26
Henan        4/4/42
Anhui        0/3/30
Hubei        2/3/16
Zhejiang     0/0/8
Jiangxi      0/2/18
Fujian       2/0/26
Yunnan       0/0/8
Guangdong    2/6/20
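
As a worked illustration, the per-province counts reported with Figure 1 can be converted into lineage-specific detection rates. The following minimal Python sketch uses the counts transcribed from the figure; the names FIGURE1_COUNTS and detection_rates are illustrative assumptions and not part of the study's methods. Because a co-infected sample may be counted under both lineages, the sketch reports L1 and L2 rates separately and does not attempt an overall positivity rate.

```python
# Counts transcribed from Figure 1:
# province -> (L1-positive samples, L2-positive samples, total samples)
FIGURE1_COUNTS = {
    "Shaanxi": (0, 0, 26),
    "Henan": (4, 4, 42),
    "Anhui": (0, 3, 30),
    "Hubei": (2, 3, 16),
    "Zhejiang": (0, 0, 8),
    "Jiangxi": (0, 2, 18),
    "Fujian": (2, 0, 26),
    "Yunnan": (0, 0, 8),
    "Guangdong": (2, 6, 20),
}


def detection_rates(counts):
    """Return province -> (L1 rate, L2 rate), each as a percentage of samples."""
    return {
        province: (100.0 * l1 / total, 100.0 * l2 / total)
        for province, (l1, l2, total) in counts.items()
    }


if __name__ == "__main__":
    for province, (rate_l1, rate_l2) in detection_rates(FIGURE1_COUNTS).items():
        print(f"{province:<10} L1 {rate_l1:5.1f}%   L2 {rate_l2:5.1f}%")
```

The totals in the dictionary sum to the 194 bats sampled, and six provinces show at least one positive sample, in agreement with the text.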


Genomic sequences 


The sizes of the BtCoV GD, HB, FJ, HN, AH, and JX genomes, excluding the 3' poly(A) tails, were 28,748, 28,745, 28,755, 28,725, 28,300, and 28,301 nt, respectively, with G+C contents of 41.8%, 41.85%, 41.87%, 41.98%, 38.17%, and 38.19%, respectively. The genomic organization of these CoVs was similar to that of other alpha-CoVs (Table 1). The main difference among genomes was in ORF7, which was present in GD, HB, FJ, and HN, but absent in AH and JX. We then compared the complete genomes (Table 2). The full-length genomic sequences of HB, FJ, and HN showed 91.9%-97.0% nt identities with each other, and lower identity with the GD genome (82.1%-85.7%). In contrast, AH and JX exhibited 96.2% overall nt identity with each other, and lower identities with the other four genomes (68.0%-68.8%). The sizes of the 5' untranslated regions of GD, HB, FJ, HN, AH, and JX were 270, 269, 268, 268, 272, and 273 nt, respectively. The core sequences of the leader transcription regulatory sequence (TRS; 5'-CUAAAC-3') were identified in the 5' untranslated sequences (Table 3). The TRSs of ORF3 and the E genes in AH and JX differed from those of the other four CoVs. The TRS of ORF7 in FJ and GD (CUGAAU) differed by 1 nt from that in HB and HN (CUGAAC). Apart from ORF3, E, and ORF7, the TRSs for the other ORFs were predicted in these six CoV genome sequences.

ORF1ab occupied approximately 70% of the genome, and consisted of ORF1a and ORF1b, encoding viral polyprotein 1a (pp1a) and pp1b, respectively. Putative features responsible for ribosomal frameshifting, e.g. the "slippage sequence" (5'-UUUAAAC-3'), were predicted in the genomes. ORF1a of AH and JX shared 98.5% aa identity, but lower (63.0%-63.8%) aa identity with the other four CoVs, while the ORF1a sequences of HB, FJ, and HN showed 99.2%-99.5% aa identity, but lower (87.5%-87.6%) aa identity with GD. The ORF1b sequences exhibited the same tendencies in terms of sequence similarities. Based on a previous analysis, the pp1a and pp1b proteins were predicted to be cleaved by virus proteases to produce a total of 16 nonstructural proteins (NSPs) (Chen et al., 2003). ORF1ab in the GD, HB, FJ, HN, AH, and JX CoVs contained functional units typical of CoVs (Table 1), including RdRp in the NSP12 region. RdRp is a highly conserved CoV protein that is frequently used for phylogenetic comparisons. All six CoV genomes had RdRp genes of the same size (2,781 nt). Amino acid sequence identity analyses of the RdRp proteins (Table 2) suggested that the six alpha-CoVs could be divided into two lineages: Lineage 1, including GD, HB, FJ, and HN, which shared 97%-100% aa identity, and Lineage 2, including AH and JX, which were closely related to each other (99.8% aa identity) and showed lower (89.9%-90.2%) aa identity with Lineage 1 CoVs.


Table 1 Predicted ORFs in the genomes of bat CoVs a)

ORF     GD                        HB                        FJ                        HN                        AH                        JX
        Position       Size (nt)  Position       Size (nt)  Position       Size (nt)  Position       Size (nt)  Position       Size (nt)  Position       Size (nt)
ORF1a   271-12,966     12,693     270-12,944     12,672     269-12,943     12,672     269-12,943     12,672     273-13,076     12,801     274-13,077     12,801
ORF1b   12,936-20,960  8,022      12,914-20,938  8,022      12,913-20,937  8,022      12,913-20,937  8,022      13,046-21,067  8,019      13,047-21,068  8,019
NSP1    271-600        330        270-599        330        269-598        330        269-598        330        273-599        327        274-600        327
NSP2    601-2,943      2,343      600-2,942      2,343      599-2,941      2,343      599-2,941      2,343      600-2,951      2,352      601-2,952      2,352
NSP3    2,944-8,175    5,232      2,943-8,153    5,211      2,942-8,152    5,211      2,942-8,152    5,211      2,952-8,288    5,337      2,953-8,289    5,337
NSP4    8,176-9,600    1,425      8,154-9,578    1,425      8,153-9,577    1,425      8,153-9,577    1,425      8,289-9,710    1,422      8,290-9,711    1,422
NSP5    9,601-10,506   906        9,579-10,484   906        9,578-10,483   906        9,578-10,483   906        9,711-10,616   906        9,712-10,617   906
NSP6    10,507-11,343  837        10,485-11,321  837        10,484-11,320  837        10,484-11,320  837        10,617-11,453  837        10,618-11,454  837
NSP7    11,344-11,592  249        11,322-11,570  249        11,321-11,569  249        11,321-11,569  249        11,454-11,702  249        11,455-11,703  249
NSP8    11,593-12,174  582        11,571-12,152  582        11,570-12,151  582        11,570-12,151  582        11,703-12,284  582        11,704-12,285  582
NSP9    12,175-12,504  330        12,153-12,482  330        12,152-12,481  330        12,152-12,481  330        12,285-12,614  330        12,286-12,615  330
NSP10   12,505-12,912  408        12,483-12,890  408        12,482-12,889  408        12,482-12,889  408        12,615-13,022  408        12,616-13,023  408
NSP11   12,913-12,966  54         12,891-12,944  54         12,890-12,943  54         12,890-12,943  54         13,023-13,076  54         13,024-13,077  54
NSP12   12,913-15,692  2,781      12,891-15,670  2,781      12,890-15,669  2,781      12,890-15,669  2,781      13,023-15,802  2,781      13,024-15,803  2,781
NSP13   15,693-17,483  1,791      15,671-17,461  1,791      15,670-17,460  1,791      15,670-17,460  1,791      15,803-17,584  1,782      15,804-17,585  1,782
NSP14   17,484-19,040  1,557      17,462-19,018  1,557      17,461-19,017  1,557      17,461-19,017  1,557      17,585-19,147  1,563      17,586-19,145  1,560
NSP15   19,041-20,057  1,017      19,019-20,035  1,017      19,018-20,034  1,017      19,018-20,034  1,017      19,148-20,164  1,017      19,146-20,165  1,020
NSP16   20,058-20,960  900        20,036-20,938  900        20,035-20,937  900        20,035-20,934  900        20,165-21,067  900        20,166-21,068  900
S       20,962-25,098  4,134      20,935-25,059  4,122      20,939-25,075  4,134      20,939-25,075  4,134      21,069-25,196  4,125      21,070-25,200  4,128
ORF3    25,098-25,766  666        25,059-25,727  666        25,075-25,743  666        25,075-25,743  666        25,196-25,855  657        25,200-25,859  657
E       25,750-25,974  222        25,711-25,935  222        25,727-25,951  222        25,727-25,951  222        25,849-26,073  222        25,853-26,077  222
M       25,984-26,742  756        25,945-26,709  762        25,961-26,719  756        25,961-26,719  756        26,080-26,841  759        26,084-26,842  756
N       26,791-28,059  1,266      26,758-28,026  1,266      26,768-28,036  1,266      26,768-28,036  1,266      26,862-28,031  1,167      26,863-28,032  1,167
ORF7a   27,809-27,979  168        27,776-28,522  744        27,786-28,532  744        27,786-28,505  717        -              -          -              -
ORF7b   28,034-28,528  492        -              -          -              -          -              -          -              -          -              -

a) BtMf-AlphaCoV/Guangdong2012 (GD), BtMf-AlphaCoV/Hubei2013 (HB), BtMf-AlphaCoV/Fujian2012 (FJ), BtMf-AlphaCoV/Henan2013 (HN), BtMf-AlphaCoV/Anhui2011 (AH), and BtMf-AlphaCoV/Jiangxi2012 (JX).


Table 2 Percent nucleotide identity between whole genomes and percent amino acid similarities between viral protein sequences in bat CoVs a)

                              Lineage 1                   Lineage 2
Nucleotide or protein  Virus  GD     HB     FJ     HN     AH     JX     1A
Genome                 HKU8   91.8   86.1   82.2   81.6   67.7   67.6   67.7
                       GD     -      82.1   85.4   85.7   68.6   68.5   68.5
                       HB     -      -      92.8   91.9   68.1   68.0   68.0
                       FJ     -      -      -      97.0   68.8   68.8   68.8
                       HN     -      -      -      -      68.7   68.7   68.6
                       AH     -      -      -      -      -      96.2   96.2
                       JX     -      -      -      -      -      -      96.0
ORF1a                  HKU8   99.0   87.2   87.1   87.3   63.4   63.4   63.0
                       GD     -      87.6   87.5   87.6   63.5   63.5   63.2
                       HB     -      -      99.2   99.5   63.6   63.7   63.3
                       FJ     -      -      -      99.3   63.7   63.7   63.3
                       HN     -      -      -      -      63.6   63.6   63.2
                       AH     -      -      -      -      -      98.5   97.7
                       JX     -      -      -      -      -      -      98.4
ORF1b                  HKU8   99.6   98.2   98.2   98.2   87.9   87.7   87.4
                       GD     -      98.3   98.2   98.3   88.0   87.8   87.5
                       HB     -      -      99.8   99.8   88.0   87.8   87.5
                       FJ     -      -      -      99.9   87.9   87.7   87.4
                       HN     -      -      -      -      87.9   87.7   87.4
                       AH     -      -      -      -      -      99.8   99.4
                       JX     -      -      -      -      -      -      99.3
RdRp                   HKU8   99.8   97.1   97.1   97.0   90.1   89.9   90.0
                       GD     -      97.1   97.1   97.0   90.1   89.9   90.0
                       HB     -      -      100.0  99.9   90.2   90.0   90.1
                       FJ     -      -      -      99.9   90.2   90.0   90.1
                       HN     -      -      -      -      90.1   89.9   90.0
                       AH     -      -      -      -      -      99.8   99.9
                       JX     -      -      -      -      -      -      99.7
S                      HKU8   52.9   95.7   53.5   53.5   49.0   48.4   49.1
                       GD     -      52.5   87.8   87.5   61.0   60.7   60.6
                       HB     -      -      52.7   52.8   49.1   48.6   49.2
                       FJ     -      -      -      98.0   60.7   59.6   60.5
                       HN     -      -      -      -      60.9   59.6   60.6
                       AH     -      -      -      -      -      93.2   93.2
                       JX     -      -      -      -      -      -      91.6
ORF3                   HKU8   97.8   98.2   97.8   97.3   46.3   46.3   46.3
                       GD     -      99.6   99.1   98.7   46.3   46.3   46.3
                       HB     -      -      99.6   99.1   46.3   46.3   46.3
                       FJ     -      -      -      99.6   46.3   46.3   46.3
                       HN     -      -      -      -      46.3   46.3   46.3
                       AH     -      -      -      -      -      99.5   99.1
                       JX     -      -      -      -      -      -      98.6
E                      HKU8   98.7   98.7   98.7   98.7   70.7   70.7   70.7
                       GD     -      100.0  100.0  100.0  70.7   70.7   70.7
                       HB     -      -      100.0  100.0  70.7   70.7   70.7
                       FJ     -      -      -      100.0  70.7   70.7   70.7
                       HN     -      -      -      -      70.7   70.7   70.7
                       AH     -      -      -      -      -      100.0  100.0
                       JX     -      -      -      -      -      -      100.0
M                      HKU8   85.6   85.3   85.6   85.6   72.2   72.5   73.0
                       GD     -      93.7   99.6   99.2   73.3   73.6   73.1
                       HB     -      -      93.7   93.7   71.5   71.8   72.9
                       FJ     -      -      -      99.6   73.3   73.6   73.1
                       HN     -      -      -      -      72.9   73.2   73.1
                       AH     -      -      -      -      -      99.6   93.3
                       JX     -      -      -      -      -      -      93.7
N                      HKU8   93.9   88.9   88.2   87.9   64.3   64.1   64.3
                       GD     -      91.5   90.3   90.1   63.8   63.6   63.8
                       HB     -      -      98.6   97.9   65.9   65.6   65.6
                       FJ     -      -      -      98.3   66.1   65.9   65.9
                       HN     -      -      -      -      65.6   65.4   65.4
                       AH     -      -      -      -      -      99.7   98.7
                       JX     -      -      -      -      -      -      99.0
ORF7                   HKU8   61.0   84.7   84.8   59.0
                       GD     -      61.3   61.0   96.5
                       HB     -      -      97.9   61.7
                       FJ     -      -      -      63.0
                       HN     -      -      -      -

a) BtMf-AlphaCoV/Guangdong2012 (GD), BtMf-AlphaCoV/Hubei2013 (HB), BtMf-AlphaCoV/Fujian2012 (FJ), BtMf-AlphaCoV/Henan2013 (HN), BtMf-AlphaCoV/Anhui2011 (AH), and BtMf-AlphaCoV/Jiangxi2012 (JX), HKU8, and 1A.
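
The G+C contents and pairwise identities reported above are simple ratios, and the following minimal Python sketch shows one way such values could be computed from sequences that have already been aligned. It is an illustration only: the software and gap-handling convention used in this study are not stated in this section, so the function names and the choice to skip gapped columns are assumptions.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among unambiguous bases (A, C, G, T/U)."""
    bases = [b for b in seq.upper() if b in "ACGTU"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)


def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity between two rows of an alignment.

    Assumption: columns in which either sequence has a gap are skipped.
    Other conventions (e.g. counting gaps as mismatches) give other values.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper())
               if x != gap and y != gap]
    if not columns:
        raise ValueError("no comparable alignment columns")
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


if __name__ == "__main__":
    print(gc_content("GGCCAATT"))
    print(percent_identity("ACGU-A", "ACGAUA"))
```

The same percent_identity function applies to nucleotide and amino acid alignments, since it compares characters column by column.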

Comparison of the aa sequences of the seven conserved replicase domains or NSPs used for CoV species demarcation (de Groot, 2011), namely ADP-ribose-1''-phosphatase, NSP5 (3CLpro), NSP12 (RdRp), NSP13 (Hel), NSP14 (3'-to-5' exonuclease; (guanine-N7)-methyltransferase), NSP15 (nidoviral uridylate-specific endoribonuclease), and NSP16 (2'-O-ribose methyltransferase), showed that Lineage 1 and Lineage 2 possessed <90% aa-sequence identity with each other. BtCoV-HKU8 showed high aa identities (87.9%-93.9%) in the N protein with the Lineage 1 CoVs (GD, FJ, HB, and HN). The N protein aa identities of the Lineage 2 CoVs (AH and JX) with BtCoV-1A and BtCoV-1B were 98.7%-99% and 91.6%-91.9%, respectively. These results indicated that Lineage 1 and Lineage 2 represented different species of Alphacoronavirus.

The most striking differences among CoVs were observed in the S protein sequence. The S gene sequence had five nts (AAAAU) inserted between the TRS and AUG in all CoVs except HB CoV (Table 3). Interestingly, the S protein (1,378 aa) was the same size in all members of Lineage 1, except HB (1,374 aa). However, the HB S protein shared only about 52.5%-52.8% aa identity with the S proteins of the other Lineage 1 CoVs. Among the other Lineage 1 CoVs, the S proteins of FJ and HN were 98.0% identical, but they shared only 87.8% and 87.5% aa identity, respectively, with GD (Table 2). In Lineage 2, the AH and JX S proteins were 93.2% identical. Notably, the S proteins of GD, FJ, and HN in Lineage 1 appeared to be more closely related to the S proteins of Lineage 2 CoVs (59.6%-61.0%) than to the S protein of HB (52.5%-52.8%). InterProScan analysis predicted that the S proteins of all six CoVs were type I membrane glycoproteins, in which most of the protein (prior to residues 1318/1319/1322) was exposed on the outside of the virus particle, and the C terminus comprised a transmembrane domain (residues 1319/1320/1323-1341/1342/1345), followed by an internal region in the virion, which was rich in cysteine residues. The S protein, which is responsible for virus entry, was divided into two domains: the S1 domain involved in receptor binding and the S2 domain for cellular membrane fusion. The putative S1 region was located at residues 229-741 for HB, 227-739 for GD and AH, 228-740 for JX, and 224-739 for FJ and HN. The diversity of S proteins was mainly within the S1 domain. HB S1 showed 93.3% aa identity with BtCoV-HKU8 and 39.6%-41.5% with other Lineage 1 and Lineage 2 CoVs. AH shared high aa identities with Lineage 2 CoVs in the S1 region (86.8%-93.7%), and GD had 85.1%-85.7% aa identities with FJ and HN. The aa identities of the S1 region were consistent with the phylogenetic trees for the whole S region (Figure 2). S2 included two putative heptad repeat regions, important for membrane fusion and viral entry (Bosch et al., 2003), located at residues 977-1122 and 1264-1320 in GD, FJ, and HN, 975-1120 and 1260-1316 in HB, and 973/974-1122/1123 and 1252/1253-1311/1312 in AH and JX.

ORF3, which encoded putative 222-aa and 219-aa proteins in Lineage 1 and Lineage 2 CoVs, respectively, was located between the S and E sequences in all six genomes. The aa sequences of ORF3 were highly conserved within Lineages 1 and 2 (98.7%-99.6% and 99.5%, respectively), but varied between lineages (46.3%). Among the CoV proteins, ORF3 showed the greatest inter-lineage diversity. Multiple transmembrane motifs were predicted in ORF3 proteins, suggesting that they might be surface proteins. TMHMM analysis showed that Lineage 1 CoVs harbored three putative transmembrane domains in ORF3 (aa residues 36-58, 70-92, and 96-113), while Lineage 2 CoVs harbored only two putative transmembrane domains (aa residues 37-59 and 71-93).


Table 3 a)

ORF         CoV  TRS sequence                  Position
Leader TRS  GD   CUCAA|CUAAAC|GAAAU            69
            HB   CUCAA|CUAAAC|GAAAU            68
            FJ   CUCAA|CUAAAC|GAAAU            67
            HN   CUCAA|CUAAAC|GAAAU            67
            AH   CUCAA|CUAAAC|GAAAU            68
            JX   CUCAA|CUAAAC|GAAAU            69
S           GD   UUCAA|CUAAAU|AAAAUG           20,953
            HB   UUCAA|CUAAAU|G                20,931
            FJ   UUCAA|CUAAAU|AAAAUG           20,930
            HN   UUCAA|CUAAAU|AAAAUG           20,930
            AH   UUCAA|CUAAAU|AAAAUG           21,060
            JX   UUCAA|CUAAAU|AAAAUG           21,061
ORF3        GD   UACAA|CAAUAC|GAAGU(N)21AUG    25,066
            HB   UACAA|CAAUAC|GAAGU(N)21AUG    25,027
            FJ   UACAA|CAAUAC|GAAGU(N)21AUG    25,043
            HN   UACAA|CAAUAC|GAAGU(N)21AUG    25,043
            AH   UACAA|CGUUAC|GAAAU(N)21AUG    25,164
            JX   UACAA|CGUUAC|GAAAU(N)21AUG    25,168
E           GD   UACAA|CUCUAC|GAAGAUG          25,740
            HB   UACAA|CUCUAC|GAAGAUG          25,701
            FJ   UACAA|CUCUAC|GAAGAUG          25,717
            HN   UACAA|CUCUAC|GAAGAUG          25,717
            AH   UUCAA|CUACAC|GAAGAUG          25,839
            JX   UUCAA|CUACAC|GAAGAUG          25,843
M           GD   GAUGU|CUAAAC|GAACAAAAUG       25,971
            HB   GAUGU|CUAAAC|GAACAAAAUG       25,932
            FJ   GAUGU|CUAAAC|GAACAAAAUG       25,948
            HN   GAUGU|CUAAAC|GAACAAAAUG       25,948
            AH   AAUGU|CUAAAC|GAGAAUG          26,070
            JX   AAUGU|CUAAAC|GAGAAUG          26,074
N           GD   AUAAA|CUAAAC|AAGUG(N)36AUG    26,744
            HB   AUAAA|CUAAAC|AAGUG(N)36AUG    26,711
            FJ   AUAAA|CUAAAC|AAGUG(N)36AUG    26,721
            HN   AUAAA|CUAAAC|AAGUG(N)36AUG    26,721
            AH   UUAAA|CUAAAC|AAGAA(N)8AUG     26,843
            JX   UUAAA|CUAAAC|AAGAA(N)8AUG     26,844
ORF7        GD   GAUUG|CUGAAU|UGCUA(N)88AUG    27,710
            HB   AAUUG|CUGAAC|UGAUU(N)88AUG    27,677
            FJ   AAUUG|CUGAAU|UGAUU(N)88AUG    27,687
            HN   AAUUG|CUGAAC|UGAUC(N)88AUG    27,687

a) For putative ORFs, we aligned the TRS that preceded the start codon AUG with the leader TRS. The core sequence (boxed in the original) is enclosed between vertical bars. The start codon of each gene (bold in the original) is the AUG at the 3' end of the sequence, except for HB S, where it overlaps the core. (N)x denotes x intervening nucleotides; the subscripts were illegible in the source and have been recalculated from the TRS positions and the ORF start positions in Table 1.
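
Locating a TRS core or the frameshift "slippage sequence" in a genome is a plain motif scan, and the number of intervening nucleotides between a listed TRS and its start codon follows from the coordinates. The Python sketch below illustrates both steps. It is a hedged example rather than the procedure used in the study: the function names are hypothetical, and spacer_length assumes that the tabulated TRS position refers to the first nucleotide of the 6-nt core, an assumption that is consistent with the ORF start positions in Table 1.

```python
import re

LEADER_TRS_CORE = "CUAAAC"     # core sequence reported in the text
SLIPPERY_SEQUENCE = "UUUAAAC"  # frameshift "slippage sequence" reported in the text


def find_motif(genome: str, motif: str) -> list[int]:
    """Return 1-based start positions of all (possibly overlapping) motif hits."""
    rna = genome.upper().replace("T", "U")      # accept DNA or RNA alphabets
    target = motif.upper().replace("T", "U")
    # A zero-width lookahead lets re.finditer report overlapping matches.
    pattern = re.compile(f"(?={re.escape(target)})")
    return [m.start() + 1 for m in pattern.finditer(rna)]


def spacer_length(core_start: int, flank: str, orf_start: int,
                  core_len: int = 6) -> int:
    """Nucleotides between the listed 3' flank of the TRS and the start codon.

    Assumption (hypothetical): `core_start` is the 1-based position of the
    first core nucleotide and `flank` immediately follows the core.
    """
    spacer = orf_start - (core_start + core_len + len(flank))
    if spacer < 0:
        raise ValueError("start codon overlaps the listed TRS or flank")
    return spacer


if __name__ == "__main__":
    print(find_motif("AACUAAACGG", LEADER_TRS_CORE))
    # GD ORF3: core at 25,066, flank GAAGU, ORF3 start at 25,098
    print(spacer_length(25066, "GAAGU", 25098))
```

With the GD coordinates, the ORF3 and N spacers come out at 21 and 36 nt, and the E gene start codon directly follows its flank. In HB the S start codon overlaps the TRS core, which the sketch reports as an error rather than a negative spacer.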


The E, M, and N proteins were highly conserved within CoVs of the same lineage (>90% identity) and were diverse between lineages (63.6%-73.6%). ORF7 was located at the 3' end of the Lineage 1 virus genomes, and overlapped with the N gene. ORF7 encoded a putative NSP of 239-248 aa residues in FJ, HN, and HB. Interestingly, ORF7 in GD comprised two small ORFs, encoding putative proteins of 56 and 164 aa residues, respectively (Table 1).
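
The protein lengths quoted in the text can be cross-checked against the coordinates in Table 1 with simple arithmetic. The Python sketch below assumes, as an inference from the tabulated numbers rather than a statement in the source, that ORF coordinates are 1-based and inclusive of the stop codon, whereas the listed sizes exclude it. The function names are illustrative.

```python
def orf_size_nt(start: int, end: int, includes_stop: bool = True) -> int:
    """Coding size in nt for 1-based inclusive coordinates.

    Assumption: the stop codon (3 nt) is inside the coordinates but is not
    counted in the reported size; pass includes_stop=False for cleavage
    products such as the NSPs, which have no stop codon of their own.
    """
    length = end - start + 1
    return length - 3 if includes_stop else length


def protein_length_aa(size_nt: int) -> int:
    """Number of encoded residues for a coding size given in nt."""
    if size_nt % 3 != 0:
        raise ValueError("coding size is not a multiple of three")
    return size_nt // 3


if __name__ == "__main__":
    # GD ORF7a and ORF7b coordinates from Table 1
    for start, end in [(27809, 27979), (28034, 28528)]:
        size = orf_size_nt(start, end)
        print(size, protein_length_aa(size))
```

Under this assumption the GD ORF7a and ORF7b coordinates give 168 and 492 nt, i.e. the 56 and 164 aa quoted above, and the GD and HB S coordinates give the 1,378 and 1,374 aa reported for the S proteins.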


Phylogenetic analyses 


We performed phylogenetic analyses based on the aa sequences of the RdRp, S, E, M, and N proteins of these BtCoVs, including the RdRp and S proteins in the five partial CoV sequences (GD-a, GD-b, HB-a, HN-a, and HN-b). Phylogenetic trees were constructed from the deduced aa sequences using MEGA5.0 software. Several reference CoV genome sequences were downloaded from GenBank and aligned with the fragments of the newly discovered CoVs (Figure 2). The results of the phylogenetic analyses were consistent with those of the sequence identity analyses, and confirmed that the newly identified alpha-CoVs could be divided into two lineages. The aa sequences of the RdRp, E, M, and N proteins in Lineage 1 viruses always clustered with BtCoV HKU8, found in M. pusillus. In contrast, phylogenetic analysis based on the S proteins showed a different tree structure, in which GD, FJ, and HN in Lineage 1 clustered together in a clade with Lineage 2 viruses, and HB and BtCoV HKU8 formed a relatively distant cluster, sharing 95.7% aa identity with each other and only 52.7%-53.5% identity with the other three Lineage 1 CoVs. Phylogenetic analysis of the S protein thus indicated that Lineage 1 CoVs could be further divided into two types: type I (HB and HKU8) and type II (FJ, HN, and GD). According to the phylogenetic trees, Lineage 2 viruses (AH, JX, GD-a, HB-a, and HN-a) always clustered with BtCoV 1A, found in M. magnater (>99.7% nt identity in RdRp and >91.4% aa identity in the S protein), and GD-b and HN-b clustered with BtCoV 1B, found in M. pusillus (98.7% aa identity in RdRp and about 92.0% in the S protein). These tree branches were very short, reflecting the high sequence similarities.

Figure 2 Phylogenetic trees based on the amino acid sequences of the partial RNA-dependent RNA polymerase (RdRp; a 324-nt sequence fragment corresponding to positions 14828-15151 in bat coronavirus BtCoV-HKU8; NC_010438), full-length spike (S), envelope (E), membrane (M), and nucleocapsid (N) proteins. The following CoVs and GenBank accession numbers were used: BtCoV-1A (NC_010437), BtCoV-1B (NC_010436), BtCoV-HKU7 (DQ249226), BtCoV-HKU2 (NC_009988), BtCoV-HKU10 (NC_018871), BtCoV-512 (NC_009657), BtCoV-Mf/Japan/01/2009 (AB619638), BtCoV-Mf/Japan/02/2009 (AB619639), BtCoV-Mf/Japan/01/2010 (AB619640), BtCoV-Mf/Japan/03/2010 (AB619642), BtCoV-A773/2005 (DQ648835), Feline infectious peritonitis virus (FIPV; AY994055), Canine CoV-341/05 (EU856361), BtCoV-HKU9 (EF065513), severe acute respiratory syndrome coronavirus (SARS-CoV; NC_004718), human CoV OC43 (HCoV-OC43; NC_005147), HCoV-HKU1 (NC_006577), HCoV-229E (NC_002645), HCoV-NL63 (NC_005831), Middle East respiratory syndrome coronavirus (HCoV-MERS; KF192507), avian infectious bronchitis virus (IBV; NC_001451), beluga whale CoV SW1 (BWCoV; NC_010646). Scale bar indicates genetic distance, estimated with a WAG+G model implemented in MEGA5 (www.megasoftware.net).


Recombination analyses


Co-infection with different CoVs in the same bat may create opportunities for recombination, potentially resulting in the emergence of new viruses. Co-infections with different lineages in M. fuliginosus were detected in two anal specimens collected in Guangdong and Henan (Wu et al., 2015). Previous studies have shown that CoVs have a tendency to undergo RNA recombination (Herrewegh et al., 1998; Lai and Cavanagh, 1997; Lau et al., 2012b; Makino et al., 1986; Zeng et al., 2008). In this study, we found that recombination events had occurred among the four Lineage 1 sequences (FJ, GD, HN, HB) and BtCoV HKU8. GD showed the highest degree of similarity to BtCoV HKU8 in the ORF1ab region with an aa identity >99% (Table 2). The ORF1ab region of GD may have originated from BtCoV HKU8 during a co-infection event in the same bat species. However, HB showed the highest degree of similarity to BtCoV HKU8 in the S region, with an aa identity of 95.7% (Table 2). The S region of HKU8 may be the parental sequence of the equivalent region in HB. Considering the diversity of the S region in Lineage 1 CoVs, we analyzed possible recombination events in Lineage 1 BtCoVs from different sites in China by detecting putative breakpoints and using SimPlot software (Wu et al., 2015). GARD analysis results were consistent with the bootscan analysis results, and three recombination breakpoints were found in the alignments of GD, HB, HN, FJ, and BtCoV HKU8 from M. pusillus (nt 20,930, nt 26,861, and nt 28,128) (Wu et al., 2015). The positions of the detected breakpoints corresponded to the areas of recombination.
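
The idea behind a similarity plot can be illustrated with a sliding-window identity scan along a pairwise alignment: a window in which a query switches from high identity with one reference to high identity with another points to a candidate breakpoint. The Python sketch below is a simplified stand-in written for illustration. It is not the SimPlot, bootscan, or GARD procedure used in the study, and the function name, window size, and step size are assumptions.

```python
from typing import Iterator, Tuple


def sliding_identity(query: str, reference: str,
                     window: int = 200, step: int = 20
                     ) -> Iterator[Tuple[int, float]]:
    """Yield (1-based window start, percent identity) along an alignment.

    Assumptions: both strings are rows of the same alignment, and columns
    with a gap ('-') in either row are ignored within each window.
    """
    if len(query) != len(reference):
        raise ValueError("aligned sequences must have equal length")
    if window < 1 or step < 1:
        raise ValueError("window and step must be positive")
    q, r = query.upper(), reference.upper()
    for start in range(0, len(q) - window + 1, step):
        columns = [(x, y)
                   for x, y in zip(q[start:start + window],
                                   r[start:start + window])
                   if x != "-" and y != "-"]
        if columns:  # skip windows that consist entirely of gaps
            identity = 100.0 * sum(x == y for x, y in columns) / len(columns)
            yield start + 1, identity


if __name__ == "__main__":
    # Toy alignment: identical first half, divergent second half.
    for position, identity in sliding_identity("AAAACCCC", "AAAAGGGG",
                                               window=4, step=4):
        print(position, identity)
```

Running the scan for one query against each candidate parent and comparing the resulting curves gives a rough picture of where the closest relative changes along the genome.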


DISCUSSION 


In this study, we detected and characterized alpha-CoVs 
carried by M. fuliginosus bats in China. M. fuligi- 
nosus-related alpha-CoVs were detected in six different 
provinces (Guangdong, Hubei, Fujian, Henan, Anhui, and 
Jiangxi), representing the middle, eastern, and southern 
parts of China. Based on genetic and phylogenetic analyses, 
these alpha-CoVs could be classified into two distinct line- 
ages, Lineage 1 and Lineage 2. Lineage 1/Lineage 2 
co-infections were detected in two specimens collected 
from Guangdong and Henan (Wu et al., 2015). 

Lineage 1 and Lineage 2 CoVs showed high intra-lineage genomic similarities, except in the S region. This high similarity suggests that the members of each lineage shared a common ancestor. However, the Lineage 1 genomes (GD, HB, FJ, and HN), isolated from Guangdong, Hubei, Fujian, and Henan provinces, presented marked differences in the S region, and phylogenetic analysis of S proteins showed that Lineage 1 CoVs formed two distinct clusters, comprising GD, FJ, and HN in one cluster, and HB in a relatively distant cluster. The same CoV in one bat species had thus evolved diverse S proteins in different provinces. Different environmental pressures, including food availability, climate, shelter, and predators, may have exerted different selection pressures on the CoVs in the same bat species in different locations, leading to the emergence of a novel S protein subtype in the same CoV isolated from different regions.

The S protein in CoV is responsible for receptor binding 
and host-species adaptation, and is one of the major deter- 
minants of specificity of host-species infection (Dveksler et al., 
1991; Lau et al., 2005, 2007). The S protein gene therefore 
constitutes one of the most variable regions within the CoV 
genome. GD in M. fuliginosus and BtCoV HKU8 in M. pu- 
sillus showed a higher degree of genomic similarity than 
any of the other CoVs, except in the S region. Phylogenetic 
analysis of the S protein revealed that BtCoV HKU8 clus- 
tered with HB, rather than with GD; indeed the BtCoV 
HKU8 S protein exhibited higher identity with HB than the 
other three Lineage 1 CoVs, including GD. Phylogenetic 
analysis, similarity plots, bootscan analysis, and recombina- 
tion-breakpoint analysis suggested that recombination oc- 
curred around the S region among BtCoV HKU8, GD, and 
HB (Wu et al., 2015), which may have facilitated adaptation
of the virus to a new bat species, finally leading to interspe- 
cies transmission (Graham and Baric, 2010; Song et al., 
2005). Furthermore, within the complete genome (including 
the S region), some of the established Lineage 2 CoVs (AH, 
JX, GD-a, HB-a, and HN-a) showed high similarity to 
BtCoV 1A found in M. magnater, while other Lineage 2 
CoVs (GD-b and HN-b) showed high similarity to BtCoV 
1B found in M. pusillus. Overall, bat migration and roosting 
habits provide opportunities for large numbers of bats to 
gather together (Cui et al., 2007; Woo et al., 2006a, 2006c; 
Woo, 2006), and could explain the mechanisms whereby 
Miniopterus acquires various viruses and transmits them to 
other bat species. In addition, our findings also suggested 
that the S protein had undergone varying degrees of modi- 
fication in response to the evolutionary pressure of adapting 
to a new host. 

Previous studies found that CoVs are particularly 
host-specific, though host-shifting has also been demon- 
strated (Jonassen et al., 2005; Lai, 1990; Liu et al., 2005; 
Rest and Mindell, 2003). A larger-scale study including 
different geographic regions will be necessary to confirm 
the phenomenon of host specificity. The results of the pre- 
sent study showed that a single bat species (M. fuliginosus) 
could harbor more than one species of CoV (Lineage 1 and
2 CoVs), and that one CoV could be found in different spe- 
cies of bats, indicating no strict association between 
BtCoVs and bat species. The availability of genomic- 
sequence data for CoVs from bat species from different 
locations will allow analysis of the relationships between 
these viruses and the geographic distribution of their hosts. 
Further characterization of novel CoVs revealed high ge- 
netic diversity across a large geographic distribution. 
Moreover, we found that the same species of bat from different
geographic locations contained the same species of
CoV, but with distinct S proteins.

The novel genomes described in this study represent the 
first genomic data for CoVs in M. fuliginosus bats in China. 
The results also provide the first evidence for the high di- 
versity of S proteins within a given CoV carried by the 
same bat species at different locations. This diversity most 
likely arose as a result of environmental pressures, migra- 
tion abilities, and roosting behaviors (Lau et al., 2012a). 
Conversely, highly similar CoV genomes, including similar 
or diverse S regions, were found in different bat species 
from different regions, suggesting that recombination and 
interspecies transmission may occur among BtCoVs. Re- 
combination may create opportunities for the emergence of 
new viruses that might drive CoV evolution (Vijaykrishna 
et al., 2007; Woo et al., 2006b). Previous studies demon- 
strated that SARS and a number of other new human dis- 
eases have emerged as a result of interspecies transmission 
of viruses carried by bats. The genetic features and host 
restriction of BtCoVs thus remain important subjects for 
global public health studies. Further studies and genomic 
analyses of CoVs from different Miniopterus species in dif- 
ferent regions will contribute to a better understanding of 
the diversity and evolution of CoVs, and periodic studies 
could provide genetic clues regarding potential emergent 
infectious viruses. 


MATERIALS AND METHODS 


Ethics statement 


The field studies did not involve endangered or protected 
species. Bats were treated according to the guidelines set 
out in the Regulations for the Administration of Laboratory 
Animals (Decree No. 2 of the State Science and Technology 
Commission of the People’s Republic of China, 1988). The 
sampling procedures were approved by the Ethics Commit- 
tee of the Institute of Pathogen Biology, Chinese Academy 
of Medical Sciences & Peking Union Medical College (Ap- 
proval number: IPB EC20100415). 


Bat samples 


Pharyngeal and anal swabs were collected from 194 cap- 
tured M. fuliginosus bats from nine provinces in China. No 
specific permissions were required for these procedures at 
these locations. All bats trapped for this study were released 
back into their habitat after sample collection. The bat spe- 
cies was initially determined morphologically and subse- 
quently confirmed by sequence analysis of mitochondrial 
cytochrome b DNA, as described previously (Tang et al., 
2006). The samples were immersed in maintenance medium 
in virus-sampling tubes (Yocon, China), temporarily stored 
at -20°C, and then transported to the laboratory and stored 
at -80°C. 


RNA extraction and virus detection 


Viral RNA was extracted from the pharyngeal and anal 
swab samples using a QIAamp viral RNA minikit (Qiagen, 
Germany). Reverse transcription was performed using a 
SuperScript III kit (Invitrogen, USA). CoV screening was 
performed by amplifying a 440-bp fragment of the RdRp 
gene of CoVs using conserved primers 
(5'-GGTTGGGACTATCCTAAGTGTGA-3' and 
5'-CCATCATCAGATAGAATCATCATA-3'), as described previously (Lau et al., 
2012a, 2012b). Polymerase chain reaction (PCR) products 
were gel purified using a QIAquick gel extraction kit (Qi- 
agen). Both strands of the PCR products were sequenced 
twice with an ABI Prism 3700 DNA analyzer (Applied Bi- 
osystems, USA), using the two PCR primers. The sequences 
of the PCR products were compared with known CoV RdRp 
gene sequences in the GenBank database. After screening 
individual samples with the conserved primers, we determined 
the CoV positivity rate in each province (Figure 1). 
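
The primer-based screening step can be illustrated with a minimal in-silico sketch. It assumes exact primer matches on the forward strand of a template; the function names and the toy template in the usage note are hypothetical and were not part of the study, and a real screen would tolerate mismatches. The primer sequences are those quoted above, with line-break hyphens removed.

```python
# Primer sequences quoted in the text (5'->3'), line-break hyphens removed.
FORWARD = "GGTTGGGACTATCCTAAGTGTGA"
REVERSE = "CCATCATCAGATAGAATCATCATA"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def in_silico_amplicon(template: str, forward: str = FORWARD,
                       reverse: str = REVERSE):
    """Return the region bounded by exact primer matches, or None.

    Assumption: primers must match the template exactly; degenerate
    positions and mismatches are not modelled in this sketch.
    """
    template = template.upper()
    start = template.find(forward)
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so on the
    # forward strand we look for its reverse complement.
    rc = reverse_complement(reverse)
    end = template.find(rc, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(rc)]
```

Applied to a hypothetical template such as `"AAAA" + FORWARD + "ACGT" * 10 + reverse_complement(REVERSE) + "TTTT"`, the function returns the primer-to-primer span; on a real RdRp sequence the returned length would be expected to be close to the 440 bp reported above.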


Complete genome sequencing 


We selected samples positive for CoVs that were repre- 
sentative of each province for genomic sequencing. The 
initial results revealed that they belonged to the genus 
Alphacoronavirus and showed close relationships with 
BtCoVHKU8, 1A, or 1B. We therefore amplified the cDNA 
using degenerate primers designed by multiple alignment of 
the genomes of BtCoVHKU8 (NC_010438), BtCoV1A 
(NC_010437), and BtCoV1B (NC_010436). Based on the 
genetic sequences obtained, sequence-specific primers were 
used in the subsequent PCR amplifications. The primers 
used to amplify the fragments of each virus are available 
upon request. The 5' and 3' ends of the viral genomes were con- 
firmed by rapid amplification of cDNA ends (RACE) using 
a 5’ RACE kit (Invitrogen) and 3’ RACE kit (TaKaRa, Ja- 
pan). For PCRs with weak or non-specific products, the 
desired DNA fragments were cloned in DNA vectors 
(pGEM-T Easy vector; Promega, USA). Multiple clones 
from a PCR were selected for standard DNA sequencing. 
Sequences were assembled and edited manually to produce 
the final viral genome sequences. Each full genome was 
deduced from a single specimen. 
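
Designing degenerate primers from a multiple alignment amounts to collapsing each alignment column into an IUPAC ambiguity code. The sketch below shows that idea only; it is not the primer-design procedure used in the study, the function names are hypothetical, and it assumes a gap-free alignment of equal-length sequences containing only A, C, G, and T.

```python
# IUPAC nucleotide ambiguity codes, keyed by the set of bases they represent.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}
# Number of concrete bases each code stands for.
_CODE_SIZE = {code: len(bases) for bases, code in IUPAC.items()}


def degenerate_consensus(aligned):
    """Collapse aligned, gap-free sequences into one degenerate sequence."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    out = []
    for column in zip(*aligned):
        bases = frozenset(b.upper() for b in column)
        if bases not in IUPAC:
            raise ValueError("only A, C, G, T are handled in this sketch")
        out.append(IUPAC[bases])
    return "".join(out)


def degeneracy(primer: str) -> int:
    """Count the distinct concrete sequences a degenerate primer encodes."""
    total = 1
    for code in primer.upper():
        total *= _CODE_SIZE[code]
    return total
```

In practice one would scan the alignment for windows of primer length with low degeneracy; for example, the three toy sequences `ACGT`, `ACGA`, and `ATGT` collapse to `AYGW`, which encodes four concrete sequences.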


Sequencing complete RdRp and S genes 


Some positive samples did not undergo complete genome 
sequencing because of limited amounts of sample. To in- 
crease the accuracy of subsequent phylogenetic analyses, 
we amplified the complete RdRp genes of four strains and 
the complete S genes of three strains, in addition to the 
complete genomes of six strains. Sequencing was performed 
using the primers available from the genomic sequencing, 
as previously described. The sequences of the PCR products 
were assembled manually to produce the complete RdRp 
and S gene sequences. 


Genomic analysis 


ORFs and their deduced amino acid (aa) sequences were pre- 
dicted from the genome nucleotide (nt) sequences using 
Vector NTI software (Invitrogen) or the ORF Finder tool of 
NCBI (http://www.ncbi.nlm.nih.gov/gorf/gorf.html). Pairwise 
genome sequence alignment was conducted with EMBOSS Needle 
software (www.ebi.ac.uk/Tools/psa/emboss_needle/) using the 
default parameters. MEGA5.0 (Tamura et al., 2011) was used 
to align nt and deduced aa sequences with the MUSCLE package 
and default parameters. The best substitution model was then 
evaluated using the Model Selection package implemented 
in MEGA5. Phylogenetic trees were constructed by the 
maximum-likelihood method under the selected model, with 
1,000 bootstrap replicates 
(Guindon et al., 2010). Protein-family analysis was per- 
formed with PFAM (Bateman et al., 2002) and InterProScan 
(Apweiler et al., 2001). Predictions of transmembrane do- 
mains were performed with TMHMM (Sonnhammer et al., 
1998). 
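
The ORF-prediction step can be approximated by a simple forward-strand scan for ATG-to-stop reading frames. The sketch below is illustrative only and does not reproduce Vector NTI or NCBI ORF Finder; the function name and the minimum-length default are assumptions, and it ignores features such as the ribosomal frameshift used to express ORF1b in coronaviruses.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_len_nt: int = 90):
    """Find forward-strand ORFs running from ATG to the next in-frame stop.

    Returns a sorted list of (start, end, frame) tuples with 0-based
    start, exclusive end (stop codon included), and frame in {0, 1, 2}.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                # Remember the first start codon of the current open frame.
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_len_nt:
                    orfs.append((start, i + 3, frame))
                start = None
    return sorted(orfs)
```

For the toy input `"CC" + "ATG" + "GCT" * 5 + "TAA" + "G"` with `min_len_nt=9`, the scan reports one ORF in frame 2 spanning positions 2 to 23.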


Recombination analysis 


Recombinations among five genomes were detected with 
SimPlot software (version 3.5.1). We used a sliding window 
of 1,000 nt, which moved in steps of 300 nt, and applied the 
Genetic Algorithms for Recombination Detection program 
in the DataMonkey software package (http://www. 
datamonkey.org) (Kosakovsky Pond et al., 2006). When 
multiple breakpoints were detected between the 
non-recombinant and recombinant models, they were as- 
sessed by comparing the corrected Akaike’s Information 
Criterion scores. The Kishino-Hasegawa test was applied to 
test whether adjacent sequence fragments yielded significant 
topological incongruence. 
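
The sliding-window scan described above (1,000-nt window, 300-nt step) can be sketched as follows. This is a simplified stand-in rather than SimPlot itself: it reports raw percent identity instead of a model-corrected similarity, the function name is hypothetical, and it assumes the two sequences are already aligned to equal length with `-` marking gaps.

```python
def sliding_identity(query: str, reference: str,
                     window: int = 1000, step: int = 300):
    """Percent identity of query vs. reference in sliding windows.

    Returns a list of (window_start, percent_identity) tuples. Columns
    with a gap in either sequence are skipped; a window with no
    comparable columns yields NaN.
    """
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    query, reference = query.upper(), reference.upper()
    results = []
    for start in range(0, len(query) - window + 1, step):
        compared = matches = 0
        for a, b in zip(query[start:start + window],
                        reference[start:start + window]):
            if a == "-" or b == "-":
                continue
            compared += 1
            matches += a == b
        pct = 100.0 * matches / compared if compared else float("nan")
        results.append((start, pct))
    return results
```

Plotting the identity of one query genome against each candidate parent along the alignment shows where the closest relative changes; such crossover points are the candidate breakpoints that the statistical tests then evaluate.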


Nucleotide sequence accession numbers 


All genome sequences have been submitted to GenBank. 
The accession numbers for the bat alpha-CoVs are 
KJ473795 to KJ473805. 